1. Introduction

There is a great demand, especially in cellular biology, for precise mathematical tools 
with which to quantify topological structure in large observed networks. Such tools can 
be used to: compare networks; distinguish between meaningful and random structural 
features; and, to define and generate tailored random graphs as null models or network 
proxies. In a previous paper [1] it was shown how a specific family of tailored random 
graph ensembles, with controlled degree distributions and controlled degree-degree 
correlation functions, is well suited for generating such tools. The authors of [1] 
applied techniques from statistical mechanics to calculate explicit formulae for the 
leading orders in the system size of the Shannon entropy per node for these tailored
graph ensembles, and related quantities such as complexity and information-theoretic
distances. Subsequent papers were devoted to the numerical generation of graphs [2]
from the proposed ensemble families and the application in cellular biology of the
resulting mathematical tools. For an overview see e.g. [1]. The main limitation
of [1] was that it only dealt with nondirected networks and graphs. In this paper we
take the next step and develop the corresponding theory for directed ones.

Extending the methods in [1] to directed networks will enable their application
to important new problems, especially in cellular biology; other applications could
include the analysis and control of communication and computation networks. For
example, to understand the processes driving a cell it is necessary to go beyond
studying individual genes; one needs to study their interactions. Information on how
genes interact within the cell is commonly represented by a directed graph: the gene 
regulation network. High-throughput methods have generated a wealth of data on 
gene regulation. We now need powerful mathematical tools to analyse these data. By 
focussing on which properties are the most important to the structure of the biological 
signalling network, we can envisage being able to postulate mechanisms for how the 
network evolved and came to fulfil its function, and build better models for such 
networks. Evaluating the fit of a network model to network data is often seen as a 
formidable computational challenge [5], which is usually overcome by looking at fit 
based on comparing network properties. Our approach gives a rigorous quantitative 
method for prioritising network properties; this is important as different properties 
might promote different potential models. 

The use of statistical mechanics to quantify the information content of network
structure is well established; see e.g. [7] and [1]. Most work so far has focused
on undirected networks. The network properties most frequently studied are degree
distributions, clustering coefficients, assortativities and path length statistics. There
has also been research on occurrences of motifs and subgraphs, motivated by the idea
that if a network favours specific local topological patterns then these might reflect
common local processes. A particular benefit of the approach followed here and in [1]
is that, although the derivations are involved in places, the final formulae are compact
and explicit. They take easily measured topological observables as input, avoid the
need for numerical simulations or approximations, and remain easy and efficient to
use as our (biological) datasets grow. We therefore expect that this line of research
will continue to develop, by adding further macroscopic network observables beyond
degree statistics and degree correlation functions. Each addition will make the method
more powerful and useful.

The specific quantities calculated in this paper are: the Shannon entropy 
and complexity of directed graph ensembles with controlled degree distributions; 
the Shannon entropy and complexity of directed graph ensembles with controlled 
degree distributions and controlled degree-degree correlation functions; and, the 
symmetrised Kullback-Leibler distance between pairs of such ensembles. For each 
of these we calculate the leading orders in the network size, expressed in terms 
of the controlled degree distributions and degree-degree correlation functions of the 
ensembles concerned. We illustrate the use of our results in section 5 with applications
to experimental data on gene regulation networks.

We adopt the following notation conventions. Each directed graph with $N$ nodes
is defined by a matrix $c = \{c_{ij}\}$, with entries $c_{ij} \in \{0,1\}$ indicating whether
($c_{ij} = 1$) or not ($c_{ij} = 0$) there is a directed arc from node $j$ to node $i$. For each
node $i$ we define the so-called in- and out-degrees, viz. $k_i^{\rm out}(c) = \sum_j c_{ji}$ and
$k_i^{\rm in}(c) = \sum_j c_{ij}$; in nondirected graphs such as in [1] one would have had
$k_i^{\rm in}(c) = k_i^{\rm out}(c)$ for all $i$. We write the pair of degrees at a site $i$ as
$k_i(c) = (k_i^{\rm in}(c), k_i^{\rm out}(c))$. Boldface letters will represent ordered sets with $N$
elements, such as $\mathbf{k}^{\rm in} = (k_1^{\rm in}, \ldots, k_N^{\rm in})$ or
$\mathbf{k}^{\rm in}(c) = (k_1^{\rm in}(c), \ldots, k_N^{\rm in}(c))$.
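
As an illustration of these conventions, the following minimal sketch computes the degree pairs and the empirical joint degree distribution from an adjacency matrix. The function names and the toy graph are hypothetical and only serve to fix the index convention (`c[i][j] = 1` means an arc from node `j` to node `i`).

```python
from collections import Counter

def degree_pairs(c):
    """Return [(k_in, k_out), ...] for a 0/1 matrix c, where c[i][j] = 1 is an arc j -> i."""
    n = len(c)
    k_in = [sum(c[i][j] for j in range(n)) for i in range(n)]   # row sums: arcs arriving at i
    k_out = [sum(c[j][i] for j in range(n)) for i in range(n)]  # column sums: arcs leaving i
    return list(zip(k_in, k_out))

def joint_degree_distribution(c):
    """Empirical joint distribution p(k) of degree pairs k = (k_in, k_out)."""
    pairs = degree_pairs(c)
    n = len(pairs)
    return {k: count / n for k, count in Counter(pairs).items()}

# Toy graph (assumed example) with arcs 0 -> 1, 0 -> 2 and 1 -> 2.
c = [[0, 0, 0],
     [1, 0, 0],
     [1, 1, 0]]
pairs = degree_pairs(c)
p = joint_degree_distribution(c)
```

In this toy graph every node has a different degree pair, so `p` assigns probability 1/3 to each of the three pairs.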
2. Directed graphs with controlled in- and out-degree distributions

Here we calculate the Shannon entropy of an ensemble of directed random graphs
constrained by a common joint distribution of in- and out-degrees. Via suitable
adaptations of the methods developed for nondirected networks, we achieve a standard
path-integral form to which we can apply the method of steepest descent. This leads
to an elegant analytical expression for the entropy of the ensemble in the leading
orders in $N$. The key term takes the form of a Kullback-Leibler distance between
the imposed joint degree distribution and the Poissonian one that would have been
found upon generating directed arcs independently.



2.1. Definition of the problem 

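We consider an ensemble of directed random graphs, where degree pairs
$k_i = (k_i^{\rm in}, k_i^{\rm out})$ are for each node $i$ drawn independently from a specified
joint degree distribution $p(k)$:

$$ p(c) = \sum_{k_1 \ldots k_N} \Big[ \prod_i p(k_i) \Big]\, p(c|k_1 \ldots k_N) \tag{2.1} $$

$$ p(c|k_1 \ldots k_N) = \frac{\prod_i \delta_{k_i, k_i(c)}}{Z(k_1 \ldots k_N)}, \qquad Z(k_1 \ldots k_N) = \sum_c \prod_i \delta_{k_i, k_i(c)} \tag{2.2} $$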



For this ensemble we want to find the Shannon entropy per node
$S = -N^{-1} \sum_c p(c) \log p(c)$, which informs us about the effective number
$\mathcal{N} = \exp(NS)$ of graphs in the ensemble and the complexity of directed graphs
with the imposed degree statistics $p(k)$. Upon substituting (2.2) into the entropy
formula, and after some simple manipulations and use of the law of large numbers,
one finds that the entropy per node takes the form

$$ S = \frac{1}{N} \sum_{k_1 \ldots k_N} \Big[ \prod_i p(k_i) \Big] \log Z(k_1 \ldots k_N) - \sum_k p(k) \log p(k) + \epsilon_N \tag{2.3} $$



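where $\epsilon_N \to 0$ as $N \to \infty$. To make the first term in this expression more
tractable, we transform $Z(k_1 \ldots k_N)$ into an average involving an alternative measure.
If we denote the average degree by $\bar{k} = N^{-1} \sum_i k_i^{\rm in} = N^{-1} \sum_i k_i^{\rm out}$
we may define the measure

$$ w(c|\bar{k}) = \prod_{i \neq j} \Big[ \frac{\bar{k}}{N} \delta_{c_{ij},1} + \Big(1 - \frac{\bar{k}}{N}\Big) \delta_{c_{ij},0} \Big] = \Big[ 1 - \frac{\bar{k}}{N} \Big]^{N(N-1)} \Big[ \frac{\bar{k}/N}{1 - \bar{k}/N} \Big]^{N \bar{k}(c)} = W(\bar{k}, \bar{k}(c)) \tag{2.4} $$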

Since this measure depends on the graph $c$ via $\bar{k}(c)$ only, we can write the partition
function $Z(k_1 \ldots k_N)$ in terms of an average over the measure (2.4), viz.

$$ Z(k_1 \ldots k_N) = \sum_c \frac{w(c|\bar{k})}{W(\bar{k}, \bar{k})} \prod_i \delta_{k_i, k_i(c)} \tag{2.5} $$



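Introducing the notation $\langle f(c) \rangle_{\bar{k}} = \sum_c w(c|\bar{k}) f(c)$ to represent averages
over the measure (2.4) with average connectivity $\bar{k}$, the entropy per node can be
written as

$$ S = \frac{1}{N} \sum_{k_1 \ldots k_N} \Big[ \prod_i p(k_i) \Big] \log \Big\langle \prod_i \delta_{k_i, k_i(c)} \Big\rangle_{\bar{k}} - \sum_k p(k) \log p(k) + \langle k \rangle \big[ \log(N/\langle k \rangle) + 1 \big] + \epsilon_N \tag{2.6} $$

with $\lim_{N \to \infty} \epsilon_N = 0$, and with $\langle k \rangle = \sum_k k^{\rm in} p(k) = \sum_k k^{\rm out} p(k)$.
The complexity of the problem is thus contained in the first term of (2.6):

$$ \phi = \frac{1}{N} \sum_{k_1 \ldots k_N} \Big[ \prod_i p(k_i) \Big] \log \Big\langle \prod_i \delta_{k_i, k_i(c)} \Big\rangle_{\bar{k}} \tag{2.7} $$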
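
The term $\langle k \rangle [\log(N/\langle k \rangle) + 1]$ in the entropy per node comes from the normalisation factor $W(\bar{k}, \bar{k})$ of the auxiliary measure. The following worked step is a sketch of that expansion, assuming $\bar{k}$ remains finite as $N \to \infty$ and that the measure has the product form over $i \neq j$ with arc probability $\bar{k}/N$:

```latex
\begin{align*}
-\frac{1}{N}\log W(\bar{k},\bar{k})
 &= -(N-1)\log\Big(1-\frac{\bar{k}}{N}\Big)
    - \bar{k}\,\log\Big[\frac{\bar{k}/N}{1-\bar{k}/N}\Big] \\
 &= \Big[\bar{k} + \mathcal{O}(N^{-1})\Big]
    + \Big[\bar{k}\,\log\Big(\frac{N}{\bar{k}}\Big) + \mathcal{O}(N^{-1})\Big] \\
 &= \bar{k}\,\big[\log(N/\bar{k}) + 1\big] + \mathcal{O}(N^{-1})
\end{align*}
```

By the law of large numbers $\bar{k} \to \langle k \rangle$ for typical degree sequences, and the $\mathcal{O}(N^{-1})$ remainder is absorbed into $\epsilon_N$.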



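2.2. Entropy evaluation

Using Fourier representations of the Kronecker deltas in (2.7) and some
straightforward manipulations brings us to

$$ \phi = \frac{1}{N} \sum_{k_1 \ldots k_N} \Big[ \prod_i p(k_i) \Big] \log \int_{-\pi}^{\pi} \prod_i \Big[ \frac{d\omega_i \, d\psi_i}{4\pi^2} \, e^{\mathrm{i}(\omega_i k_i^{\rm in} + \psi_i k_i^{\rm out})} \Big] L(\omega, \psi) \tag{2.8} $$

$$ L(\omega, \psi) = \exp \Big[ \bar{k} N \Big( \frac{1}{N} \sum_i e^{-\mathrm{i}\omega_i} \Big) \Big( \frac{1}{N} \sum_j e^{-\mathrm{i}\psi_j} \Big) - \bar{k} N + \mathcal{O}(N^0) \Big] \tag{2.9} $$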

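Introducing the quantities $R(\omega) = N^{-1} \sum_i e^{-\mathrm{i}\omega_i}$ and
$S(\psi) = N^{-1} \sum_i e^{-\mathrm{i}\psi_i}$, and inserting
$\int dR \, dS \, \delta[R - R(\omega)] \, \delta[S - S(\psi)]$ with $\delta$-functions written in integral
form, allows us to write

$$ L(\omega, \psi) = \int \frac{dR \, d\hat{R} \, dS \, d\hat{S}}{4\pi^2/N^2} \, e^{N[\mathrm{i}(\hat{R}R + \hat{S}S) + \bar{k}(RS - 1)] + \mathcal{O}(N^0)} \, e^{-\mathrm{i}\hat{R} \sum_i e^{-\mathrm{i}\omega_i} - \mathrm{i}\hat{S} \sum_i e^{-\mathrm{i}\psi_i}} \tag{2.10} $$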



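Substituting this back into $\phi$, using the law of large numbers, then gives

$$ \phi = \frac{1}{N} \sum_{k_1 \ldots k_N} \Big[ \prod_i p(k_i) \Big] \log \int dR \, d\hat{R} \, dS \, d\hat{S} \; e^{N \Psi(R, \hat{R}, S, \hat{S})} + \mathcal{O}(N^{-1} \log N) \tag{2.11} $$

where

$$ \Psi(R, \hat{R}, S, \hat{S}) = \mathrm{i}(\hat{R}R + \hat{S}S) + \bar{k}(RS - 1) + \sum_{k^{\rm in}} p(k^{\rm in}) \log \int_{-\pi}^{\pi} \frac{d\omega}{2\pi} \, e^{\mathrm{i}[\omega k^{\rm in} - \hat{R} e^{-\mathrm{i}\omega}]} + \sum_{k^{\rm out}} p(k^{\rm out}) \log \int_{-\pi}^{\pi} \frac{d\psi}{2\pi} \, e^{\mathrm{i}[\psi k^{\rm out} - \hat{S} e^{-\mathrm{i}\psi}]} \tag{2.12} $$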



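The average in (2.11) over degree sequences is now obsolete since the argument depends
in leading order in $N$ on their distribution only, and (2.11) can be evaluated by steepest
descent:

$$ \lim_{N \to \infty} \phi = {\rm extr}_{R, \hat{R}, S, \hat{S}} \, \Psi(R, \hat{R}, S, \hat{S}) \tag{2.13} $$

We can simplify $\Psi$ by doing the remaining integrals, using

$$ \int_{-\pi}^{\pi} \frac{d\omega}{2\pi} \, e^{\mathrm{i}[\omega k - \hat{R} e^{-\mathrm{i}\omega}]} = \sum_{m \geq 0} \frac{(-\mathrm{i}\hat{R})^m}{m!} \int_{-\pi}^{\pi} \frac{d\omega}{2\pi} \, e^{\mathrm{i}\omega(k - m)} = \frac{(-\mathrm{i}\hat{R})^k}{k!} \tag{2.14} $$

Hence

$$ \Psi(R, \hat{R}, S, \hat{S}) = \mathrm{i}(\hat{R}R + \hat{S}S) + \bar{k}(RS - 1) + \sum_{k^{\rm in}} p(k^{\rm in}) \log \big[ (-\mathrm{i}\hat{R})^{k^{\rm in}} / k^{\rm in}! \big] + \sum_{k^{\rm out}} p(k^{\rm out}) \log \big[ (-\mathrm{i}\hat{S})^{k^{\rm out}} / k^{\rm out}! \big] \tag{2.15} $$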
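
The integral identity used to simplify $\Psi$ can be checked numerically. The sketch below is only an illustration, with hypothetical function names: it approximates the integral over $\omega \in [-\pi, \pi]$ by a uniform Riemann sum (accurate here because the integrand is smooth and periodic) and compares it with the closed form $(-\mathrm{i}\hat{R})^k / k!$.

```python
import cmath
import math

def fourier_integral(k, r_hat, points=256):
    """Riemann-sum estimate of (1/2pi) * integral over [-pi, pi] of exp(i[w k - r_hat e^{-i w}])."""
    total = 0j
    for n in range(points):
        w = -math.pi + 2.0 * math.pi * n / points
        total += cmath.exp(1j * (w * k - r_hat * cmath.exp(-1j * w)))
    return total / points

def closed_form(k, r_hat):
    """Right-hand side of the identity: (-i r_hat)^k / k!."""
    return (-1j * r_hat) ** k / math.factorial(k)

# Assumed test values: an imaginary r_hat, as occurs at the saddle point, and a real one.
numeric_imag = fourier_integral(3, 1.5j)
exact_imag = closed_form(3, 1.5j)
numeric_real = fourier_integral(2, 0.7)
exact_real = closed_form(2, 0.7)
```

The number of grid points is an arbitrary choice; for moderate values of $|\hat{R}|$ the two sides agree to near machine precision.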

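Differentiation of $\Psi$ gives the following saddle-point equations:

$$ -\mathrm{i}\hat{R} = \bar{k} S, \qquad -\mathrm{i}\hat{S} = \bar{k} R \tag{2.16} $$

$$ \mathrm{i}\hat{R}R + \bar{k} = 0, \qquad \mathrm{i}\hat{S}S + \bar{k} = 0 \tag{2.17} $$

We conclude that $RS = 1$, and hence at the saddle-point we have

$$ \Psi(R, \hat{R}, S, \hat{S}) = \sum_{k^{\rm in}} p(k^{\rm in}) \log \pi_{\bar{k}}(k^{\rm in}) + \sum_{k^{\rm out}} p(k^{\rm out}) \log \pi_{\bar{k}}(k^{\rm out}) \tag{2.18} $$

with the Poissonian degree distribution $\pi_{\bar{k}}(k) = e^{-\bar{k}} \bar{k}^k / k!$.

2.3. Final analytical expression for the entropy of the ensemble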

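The intermediate result (2.18) can now be substituted back into the expression for the
entropy of the constrained random graph ensemble defined in (2.6), giving

$$ S = \bar{k} \big[ \log(N/\bar{k}) + 1 \big] - \sum_{k^{\rm in}, k^{\rm out}} p(k^{\rm in}, k^{\rm out}) \log \Big[ \frac{p(k^{\rm in}, k^{\rm out})}{\pi_{\bar{k}}(k^{\rm in}) \, \pi_{\bar{k}}(k^{\rm out})} \Big] + \epsilon_N \tag{2.19} $$

where $\bar{k}$ is the average connectivity, $N$ is the number of nodes in the network,
$p(k^{\rm in}, k^{\rm out})$ is the degree distribution that constrains the random graph ensemble,
and $\lim_{N \to \infty} \epsilon_N = 0$.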
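
The leading-order entropy formula is straightforward to evaluate once a joint degree distribution is available. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the distribution is assumed to be a dictionary keyed by pairs `(k_in, k_out)`, the average degree is assumed positive, and the finite-size correction $\epsilon_N$ is dropped.

```python
import math

def log_poisson(k, mean):
    """log of the Poisson probability e^{-mean} mean^k / k!."""
    return -mean + k * math.log(mean) - math.lgamma(k + 1)

def entropy_degree_ensemble(p, n_nodes):
    """Leading-order entropy per node for a prescribed joint degree distribution.

    p maps (k_in, k_out) to a probability; the epsilon_N term is ignored.
    """
    k_bar_in = sum(prob * k_in for (k_in, _), prob in p.items())
    k_bar_out = sum(prob * k_out for (_, k_out), prob in p.items())
    if abs(k_bar_in - k_bar_out) > 1e-9:
        raise ValueError("mean in-degree and mean out-degree must coincide")
    k_bar = k_bar_in
    s0 = k_bar * (math.log(n_nodes / k_bar) + 1.0)
    # Kullback-Leibler distance between p and the product of two Poisson laws.
    c_deg = sum(
        prob * (math.log(prob) - log_poisson(k_in, k_bar) - log_poisson(k_out, k_bar))
        for (k_in, k_out), prob in p.items() if prob > 0.0
    )
    return s0 - c_deg

# Assumed example: every node has in-degree 3 and out-degree 3, with N = 1000.
s_regular = entropy_degree_ensemble({(3, 3): 1.0}, 1000)
```

For this regular example the result agrees with the closed form $\bar{k}[\log(N\bar{k}) - 1] - 2\log(\bar{k}!)$ evaluated at $\bar{k} = 3$.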

The compact form of (2.19) enables us to interpret and understand this result
for the entropy per node. For example, we can consider what the result would have
been if the constraint on the ensemble had been less restrictive. If our ensemble
was a maximum entropy ensemble on the space of all directed graphs, but now
constrained by the average degree only (as opposed to the full joint in- and
out-degree distribution), then the entropy per node would have been
$S = \bar{k}[\log(N/\bar{k}) + 1]$. We see that this is identical to what we would obtain
from (2.19) if the constraining degree distribution was
$p(k^{\rm in}, k^{\rm out}) = \pi(k^{\rm in}) \pi(k^{\rm out})$; a trivial calculation confirms that
in the maximum entropy ensemble with constrained average degree one indeed has
$p(k^{\rm in}, k^{\rm out}) = \pi(k^{\rm in}) \pi(k^{\rm out})$ for $N \to \infty$. Similarly, if we had chosen
a maximum entropy ensemble of directed graphs constrained by a prescribed degree
sequence (as opposed to a joint degree distribution), then the entropy would have
taken the form

$$ S = \bar{k} \big[ \log(N/\bar{k}) + 1 \big] + \sum_{k^{\rm in}, k^{\rm out}} p(k^{\rm in}, k^{\rm out}) \log \big[ \pi(k^{\rm in}) \pi(k^{\rm out}) \big] + \epsilon_N \tag{2.20} $$

This value is seen to be simply (2.19) minus the Shannon entropy of the joint degree
distribution $p(k^{\rm in}, k^{\rm out})$, reflecting the possible ways to relabel sites in the original
ensemble; this freedom is removed once we specify the individual degrees rather than
their distribution.
3. Directed graphs with controlled degree distributions and
degree-degree correlation functions

We extend our calculation to directed graph ensembles that are constrained further,
by imposing a degree-degree correlation function in addition to a degree distribution.
Degree-degree correlations in networks are known to carry valuable information. They
can give rise to properties such as 'assortativity' or 'disassortativity' and often reflect
the algorithm responsible for a network's generation. One such algorithm, 'preferential
attachment', is well illustrated by the World Wide Web, where pages are more likely to
be 'linked' to if they already have many pages linking to them. Preferential attachment
models such as [6] gained credibility by reproducing the typical fat tails often found
in the degree distributions of real networks.



3.1. Definition of the problem 

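We now wish to generate graphs with degree pairs $(k_i^{\rm in}, k_i^{\rm out})$ again drawn
independently from the distribution $p(k) = p(k^{\rm in}, k^{\rm out})$, but now the link
probabilities are modified by some function $Q(k_i, k_j|p)$ of the degrees of the nodes
concerned, and their distribution, with $k_i = (k_i^{\rm in}, k_i^{\rm out})$:

$$ p(c|p, Q) = \sum_{k_1 \ldots k_N} \Big[ \prod_i p(k_i) \Big] \, p(c|k_1 \ldots k_N, Q) \tag{3.1} $$

$$ p(c|k_1 \ldots k_N, Q) = \frac{w(c|k_1 \ldots k_N, Q) \prod_i \delta_{k_i, k_i(c)}}{Z(k_1 \ldots k_N, Q)} \tag{3.2} $$

$$ Z(k_1 \ldots k_N, Q) = \sum_c w(c|k_1 \ldots k_N, Q) \prod_i \delta_{k_i, k_i(c)} $$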

The difference with the graph ensemble in the previous section is the appearance of a
new measure $w(c|k_1 \ldots k_N, Q)$, defined as

$$ w(c|k_1 \ldots k_N, Q) = \prod_{i \neq j} \Big[ \frac{\bar{k}}{N} Q(k_i, k_j|p) \, \delta_{c_{ij},1} + \Big( 1 - \frac{\bar{k}}{N} Q(k_i, k_j|p) \Big) \delta_{c_{ij},0} \Big] \tag{3.3} $$

with $Q(k_i, k_j|p) \geq 0$ for all $(k_i, k_j)$, and with the distribution
$p(k) = N^{-1} \sum_i \delta_{k, k_i}$ and the average degree
$\bar{k} = N^{-1} \sum_i k_i^{\rm in} = N^{-1} \sum_i k_i^{\rm out}$ of the imposed degree sequence. The
objective of the measure (3.3) is to deform the graph probabilities such as to impose
a specific correlation profile between the degrees of connected nodes, by a suitable
choice of the kernel $Q(\cdot, \cdot)$. We take $Q(\cdot, \cdot)$ to be normalized such that
$w(c|\ldots)$ is asymptotically consistent with the average degree $\bar{k}$. This means that
we demand $N^{-2} \sum_{ij} Q(k_i, k_j|p) = 1$. Equivalently,
$\sum_{k, k'} p(k) p(k') Q(k, k'|p) = 1$, which explains why $Q(\cdot, \cdot)$ depends on the
distribution $p$. The entropy per node $S$ of our ensemble is

$$ S = -\sum_c p(c|p, Q) \, \Omega(c|p, Q) \tag{3.4} $$

$$ \Omega(c|p, Q) = N^{-1} \log p(c|p, Q) \tag{3.5} $$



3.2. Entropy evaluation 



In Appendix A we calculate the quantity (3.5) in leading orders in $N$, resulting in
formula (A.23). Substitution into expression (3.4) for the entropy, followed by doing
the average over $p(c|p, Q)$ and some simple re-arranging of terms, then gives us

$$ S = \bar{k} \big[ \log(N/\bar{k}) + 1 \big] - \sum_k p(k) \log \Big[ \frac{p(k)}{\pi_{\bar{k}}(k^{\rm in}) \, \pi_{\bar{k}}(k^{\rm out})} \Big] - \bar{k} \sum_{k, k'} W(k, k') \log \Big[ \frac{R(k|p, Q) \, Q(k, k'|p) \, S(k'|p, Q)}{W_1(k) W_2(k')} \Big] + \epsilon_N \tag{3.6} $$



with $\lim_{N \to \infty} \epsilon_N = 0$, $\pi_{\bar{k}}(k) = e^{-\bar{k}} \bar{k}^k / k!$, and
$\bar{k} = \sum_k p(k) k^{\rm in} = \sum_k p(k) k^{\rm out}$. The kernel $W(k, k')$ and its two
marginals $W_{1,2}(k)$ in this expression are as defined in (A.8, A.9, A.10), but now
calculated for graphs from our ensemble (3.1). Similarly, the quantities $R(k|p, Q)$ and
$S(k|p, Q)$ are now solved from

$$ R(k) = \frac{p(k) \, k^{\rm in}}{\bar{k} \sum_{k'} Q(k, k'|p) \, S(k')}, \qquad S(k) = \frac{p(k) \, k^{\rm out}}{\bar{k} \sum_{k'} Q(k', k|p) \, R(k')} \tag{3.7} $$

in which the distribution $p(k)$, its associated average $\bar{k}$, as well as the kernel
$Q(k, k'|p)$, correspond to ensemble (3.1). Thus the correct normalization of the kernel
$Q(\cdot, \cdot)$ is $\sum_{k, k'} p(k) p(k') Q(k, k'|p) = 1$. What remains is to express the
distribution $W(k, k'|p, Q)$ for ensemble (3.1) in terms of $\{p, Q\}$. This is done in
Appendix B, resulting in (B.3):

$$ \lim_{N \to \infty} W(k, k') = R(k|p, Q) \, Q(k, k'|p) \, S(k'|p, Q) \tag{3.8} $$

in which $R(k|p, Q)$ and $S(k|p, Q)$ are once more the solutions of (3.7), but now with
p(k) replaced by p(k). Combination with (3.6) then gives us



$$ S = \bar{k} \big[ \log(N/\bar{k}) + 1 \big] - \sum_k p(k) \log \Big[ \frac{p(k)}{\pi_{\bar{k}}(k^{\rm in}) \, \pi_{\bar{k}}(k^{\rm out})} \Big] - \bar{k} \sum_{k, k'} W(k, k') \log \Big[ \frac{W(k, k')}{W_1(k) W_2(k')} \Big] + \epsilon_N \tag{3.9} $$

with $\lim_{N \to \infty} \epsilon_N = 0$. Compared to the entropy per node (2.19) of ensembles where
only the in-out degree distributions are imposed, we see that imposing in addition our
new constraint, the specific degree-degree correlations as embodied by $W(k, k')$, leads
to a reduction of the entropy by an amount proportional to the mutual information
of in-out degrees of connected nodes. An analogous result was derived in [1] for
nondirected graphs. It can immediately be seen that if the in-out degrees of connected
nodes are statistically independent, then the final nonvanishing term of (3.9) will be zero.
Hence the entropy of the ensemble will in that case be the same as though the only
constraint was the degree distribution.
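
The correlation-induced entropy reduction, $\bar{k}$ times the mutual information between the degree pairs at the two ends of an arc, can be measured directly on a given graph. The sketch below is an illustration under stated assumptions: the function name is hypothetical, `c[i][j] = 1` denotes an arc from `j` to `i`, self-loops are ignored, and $W(k, k')$ is estimated as the fraction of arcs whose target has degree pair $k$ and whose source has degree pair $k'$.

```python
import math
from collections import Counter

def wiring_complexity(c):
    """k_bar times the mutual information of W(k, k'), estimated from a 0/1 matrix c."""
    n = len(c)
    k_in = [sum(row) for row in c]
    k_out = [sum(c[j][i] for j in range(n)) for i in range(n)]
    arcs = Counter()
    for i in range(n):
        for j in range(n):
            if i != j and c[i][j]:
                # k refers to the target node i, k' to the source node j.
                arcs[((k_in[i], k_out[i]), (k_in[j], k_out[j]))] += 1
    total = sum(arcs.values())
    if total == 0:
        return 0.0
    w = {pair: count / total for pair, count in arcs.items()}
    w1, w2 = Counter(), Counter()
    for (k, k_prime), value in w.items():
        w1[k] += value        # marginal W_1(k)
        w2[k_prime] += value  # marginal W_2(k')
    mutual_information = sum(
        value * math.log(value / (w1[k] * w2[k_prime]))
        for (k, k_prime), value in w.items()
    )
    return (total / n) * mutual_information

# Assumed examples: a directed 3-cycle, and a 2-cycle plus one isolated arc 2 -> 3.
cycle = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
mixed = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0]]
c_wir_cycle = wiring_complexity(cycle)
c_wir_mixed = wiring_complexity(mixed)
```

In the cycle all nodes share one degree pair, so no degree correlations are possible and the result vanishes; in the second graph the degrees at the two ends of each arc are fully correlated, which gives a positive value.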



4. Quantifying structural distance between networks

4.1. Derivation of the distance formula

In this section we define and calculate an information-theoretic distance between two
directed networks $A$ and $B$, with in-out degree distributions $p_A(k)$ and $p_B(k)$ and with
degree-degree correlation functions $W_A(k, k')$ and $W_B(k, k')$. We generalize to the
present context of directed graphs the choice made in [1], viz. the Jeffreys divergence
(i.e. symmetrized Kullback-Leibler distance) per node of the two associated ensembles
from our family (3.1):

$$ D_{AB} = \frac{1}{2N} \sum_c \Big\{ p(c|p_A, Q_A) \log \Big[ \frac{p(c|p_A, Q_A)}{p(c|p_B, Q_B)} \Big] + p(c|p_B, Q_B) \log \Big[ \frac{p(c|p_B, Q_B)}{p(c|p_A, Q_A)} \Big] \Big\} \tag{4.1} $$

$D_{AB}$ is non-negative and equals zero only when both networks $A$ and $B$ belong to
the same tailored graph ensemble (i.e. have equivalent constraints). Upon writing the
Shannon entropies per node of the ensembles $A$ and $B$ as $S_A = S_{AA}$ and $S_B = S_{BB}$,
we have

$$ D_{AB} = \frac{1}{2} \big( S_{AB} + S_{BA} - S_{AA} - S_{BB} \big) \tag{4.2} $$

where we use the abbreviation

$$ S_{AB} = -\frac{1}{N} \sum_c p(c|p_A, Q_A) \log p(c|p_B, Q_B) = -\sum_c p(c|p_A, Q_A) \, \Omega(c|p_B, Q_B) \tag{4.3} $$

with $\Omega(c|p, Q)$ as defined in (3.5). We may now use result (A.23) of Appendix A,
but in doing so it is vital to keep track carefully of the labels $(A, B)$ of the degree
distributions and kernels. In particular, according to (4.3) we must make in (A.23)
the substitutions $p(k|c) \to p_A(k)$, $W(k, k'|c) \to W_A(k, k')$, $p(k) \to p_B(k)$, and
$Q(k, k'|p) \to Q_B(k, k'|p_A)$. This leads us to



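$$ \lim_{N \to \infty} S_{AB} = -\sum_k p_A(k) \log p_B(k) - \bar{k}_A \Big[ 1 + \log \Big( \frac{\bar{k}_A}{N} \Big) \Big] - \sum_k p_A(k) \log \big( k^{\rm in}! \, k^{\rm out}! \big) + \sum_k p_A(k) \, k^{\rm in} \log \Big[ \frac{p_A(k) \, k^{\rm in}}{R(k|p_A, Q_B)} \Big] + \sum_k p_A(k) \, k^{\rm out} \log \Big[ \frac{p_A(k) \, k^{\rm out}}{S(k|p_A, Q_B)} \Big] - \bar{k}_A \sum_{k, k'} W_A(k, k') \log Q_B(k, k'|p_A) \tag{4.4} $$

in which $R(k|p_A, Q_B)$ and $S(k|p_A, Q_B)$ are to be solved from

$$ R(k) = \frac{p_A(k) \, k^{\rm in}}{\bar{k}_A \sum_{k'} Q_B(k, k'|p_A) \, S(k')}, \qquad S(k) = \frac{p_A(k) \, k^{\rm out}}{\bar{k}_A \sum_{k'} Q_B(k', k|p_A) \, R(k')} \tag{4.5} $$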

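Hence, upon assembling and combining the various terms in (4.2) and upon using
relations such as (A.9, A.10) and (B.3) to simplify the result, we find

$$ D_{AB} = \frac{1}{2} \sum_k p_A(k) \log \Big[ \frac{p_A(k)}{p_B(k)} \Big] + \frac{1}{2} \sum_k p_B(k) \log \Big[ \frac{p_B(k)}{p_A(k)} \Big] + \frac{1}{2} \bar{k}_A \sum_{k, k'} W_A(k, k') \log \Big[ \frac{W_A(k, k')}{R(k|p_A, Q_B) \, Q_B(k, k'|p_A) \, S(k'|p_A, Q_B)} \Big] + \frac{1}{2} \bar{k}_B \sum_{k, k'} W_B(k, k') \log \Big[ \frac{W_B(k, k')}{R(k|p_B, Q_A) \, Q_A(k, k'|p_B) \, S(k'|p_B, Q_A)} \Big] \tag{4.6} $$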
According to (B.3), the product
$W_{AB}(k, k') = R(k|p_A, Q_B) \, Q_B(k, k'|p_A) \, S(k'|p_A, Q_B)$
equals the joint distribution of in- and out-degrees of connected nodes in an
ensemble of the family (3.1) that would have been obtained upon choosing the
hybrid combination $\{p_A, Q_B\}$ of degree distribution and wiring kernel, where $Q_B$
is normalized according to $\sum_{k, k'} p_A(k) p_A(k') Q_B(k, k'|p_A) = 1$. Similarly, the product
$W_{BA}(k, k') = R(k|p_B, Q_A) \, Q_A(k, k'|p_B) \, S(k'|p_B, Q_A)$ would have been obtained
for the ensemble $\{p_B, Q_A\}$. Thus we may write

$$ \lim_{N \to \infty} D_{AB} = \frac{1}{2} \sum_k p_A(k) \log \Big[ \frac{p_A(k)}{p_B(k)} \Big] + \frac{1}{2} \sum_k p_B(k) \log \Big[ \frac{p_B(k)}{p_A(k)} \Big] + \frac{1}{2} \bar{k}_A \sum_{k, k'} W_A(k, k') \log \Big[ \frac{W_A(k, k')}{W_{AB}(k, k')} \Big] + \frac{1}{2} \bar{k}_B \sum_{k, k'} W_B(k, k') \log \Big[ \frac{W_B(k, k')}{W_{BA}(k, k')} \Big] \tag{4.7} $$

This appealing formula shows that $D_{AB} \geq 0$ for all choices of $(A, B)$, with equality if
and only if $W_A = W_B$; in the latter case one automatically will have
$W_{AB} = W_{BA} = W_A = W_B$. In the case where degree-degree correlations are absent from
both networks one will find $W_{AB}(k, k') = W_A(k, k') = W_{1A}(k) W_{2A}(k')$, and formula
(4.7) reduces to the Jeffreys divergence between the degree distributions $p_A$ and $p_B$.
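
The degree-distribution part of the distance, i.e. the first two terms of the formula above, only requires the two measured joint degree distributions. A minimal sketch follows; the function name is hypothetical, distributions are assumed to be dictionaries keyed by `(k_in, k_out)`, and the convention of returning infinity when the supports differ is a choice made here for illustration, not something prescribed by the text.

```python
import math

def degree_distance(p_a, p_b):
    """(1/2) sum_k [p_a log(p_a/p_b) + p_b log(p_b/p_a)] for two degree distributions."""
    total = 0.0
    for k in set(p_a) | set(p_b):
        a = p_a.get(k, 0.0)
        b = p_b.get(k, 0.0)
        if a == b:
            continue  # identical weights contribute nothing
        if a == 0.0 or b == 0.0:
            return math.inf  # a degree pair present in only one network
        # The two Kullback-Leibler terms combine into (a - b) log(a / b).
        total += 0.5 * (a - b) * math.log(a / b)
    return total

# Assumed toy distributions over two degree pairs.
p_a = {(1, 1): 0.5, (2, 0): 0.5}
p_b = {(1, 1): 0.25, (2, 0): 0.75}
d_deg = degree_distance(p_a, p_b)
```

The result is symmetric in its two arguments and vanishes only when the two distributions coincide. Real networks often contain degree pairs that occur in one network but not the other, so in practice some form of smoothing or binning may be needed before this quantity is finite.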



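4.2. Practical form of the distance formula

In contrast to $W_A$ and $W_B$, which correspond to the two given networks $c_A$ and $c_B$,
we cannot measure $W_{AB}$ and $W_{BA}$; the latter would correspond to hypothetical hybrid
networks. Hence in order to use (4.7) in practice it will be convenient to write it in
an alternative form:

$$ \lim_{N \to \infty} D_{AB} = \frac{1}{2} \sum_k p_A(k) \log \Big[ \frac{p_A(k)}{p_B(k)} \Big] + \frac{1}{2} \sum_k p_B(k) \log \Big[ \frac{p_B(k)}{p_A(k)} \Big] + \frac{1}{2} \bar{k}_A \sum_{k, k'} W_A(k, k') \log \Big[ \frac{W_A(k, k')}{W_B(k, k')} \Big] + \frac{1}{2} \bar{k}_B \sum_{k, k'} W_B(k, k') \log \Big[ \frac{W_B(k, k')}{W_A(k, k')} \Big] + \frac{1}{2} \bar{k}_A \sum_{k, k'} W_A(k, k') \log \Big[ \frac{W_B(k, k')}{R(k|p_A, Q_B) \, Q_B(k, k'|p_A) \, S(k'|p_A, Q_B)} \Big] + \frac{1}{2} \bar{k}_B \sum_{k, k'} W_B(k, k') \log \Big[ \frac{W_A(k, k')}{R(k|p_B, Q_A) \, Q_A(k, k'|p_B) \, S(k'|p_B, Q_A)} \Big] \tag{4.8} $$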



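If we choose $Q_A$ and $Q_B$ to be the canonical kernels for the two ensembles $A$ and $B$, i.e.
$Q_A(k, k'|p) = W_A(k, k') / p(k) p(k')$ and $Q_B(k, k'|p) = W_B(k, k') / p(k) p(k')$,
expression (4.8) simplifies to

$$ \lim_{N \to \infty} D_{AB} = \frac{1}{2} \sum_k p_A(k) \log \Big[ \frac{p_A(k)}{p_B(k)} \Big] + \frac{1}{2} \sum_k p_B(k) \log \Big[ \frac{p_B(k)}{p_A(k)} \Big] + \frac{1}{2} \bar{k}_A \sum_{k, k'} W_A(k, k') \log \Big[ \frac{W_A(k, k')}{W_B(k, k')} \Big] + \frac{1}{2} \bar{k}_B \sum_{k, k'} W_B(k, k') \log \Big[ \frac{W_B(k, k')}{W_A(k, k')} \Big] + \frac{1}{2} \bar{k}_A \Big\{ \sum_k W_{1A}(k) \log \Big[ \frac{p_A(k)}{R(k|p_A, Q_B)} \Big] + \sum_{k'} W_{2A}(k') \log \Big[ \frac{p_A(k')}{S(k'|p_A, Q_B)} \Big] \Big\} + \frac{1}{2} \bar{k}_B \Big\{ \sum_k W_{1B}(k) \log \Big[ \frac{p_B(k)}{R(k|p_B, Q_A)} \Big] + \sum_{k'} W_{2B}(k') \log \Big[ \frac{p_B(k')}{S(k'|p_B, Q_A)} \Big] \Big\} \tag{4.9} $$

with $R(k|p_A, Q_B)$ and $S(k|p_A, Q_B)$ to be solved from

$$ \frac{R(k)}{p_A(k)} = \frac{W_{1A}(k)}{\sum_{k'} W_B(k, k') \, [S(k') / p_A(k')]} \tag{4.10} $$

$$ \frac{S(k)}{p_A(k)} = \frac{W_{2A}(k)}{\sum_{k'} W_B(k', k) \, [R(k') / p_A(k')]} \tag{4.11} $$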



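Next we rewrite the arguments of the logarithms in the second line of (4.8) in
terms of the two degree correlation ratios $\Pi_A(k, k') = W_A(k, k') / W_{1A}(k) W_{2A}(k')$ and
$\Pi_B(k, k') = W_B(k, k') / W_{1B}(k) W_{2B}(k')$. We also transform the order parameters
$R(k|p_A, Q_B)$ and $S(k|p_A, Q_B)$ to new functions $\rho_{AB}(k)$ and $\sigma_{AB}(k)$ via

$$ \rho_{AB}(k) = \frac{p_A(k) \, W_{1A}(k)}{R(k|p_A, Q_B) \, W_{1B}(k)}, \qquad \sigma_{AB}(k) = \frac{p_A(k) \, W_{2A}(k)}{S(k|p_A, Q_B) \, W_{2B}(k)} \tag{4.12} $$

Our distance then becomes

$$ \lim_{N \to \infty} D_{AB} = \frac{1}{2} \sum_k p_A(k) \log \Big[ \frac{p_A(k)}{p_B(k)} \Big] + \frac{1}{2} \sum_k p_B(k) \log \Big[ \frac{p_B(k)}{p_A(k)} \Big] + \frac{1}{2} \bar{k}_A \sum_{k, k'} W_A(k, k') \log \Big[ \frac{\Pi_A(k, k')}{\Pi_B(k, k')} \Big] + \frac{1}{2} \bar{k}_B \sum_{k, k'} W_B(k, k') \log \Big[ \frac{\Pi_B(k, k')}{\Pi_A(k, k')} \Big] + \frac{1}{2} \bar{k}_A \sum_k W_{1A}(k) \log \rho_{AB}(k) + \frac{1}{2} \bar{k}_A \sum_k W_{2A}(k) \log \sigma_{AB}(k) + \frac{1}{2} \bar{k}_B \sum_k W_{1B}(k) \log \rho_{BA}(k) + \frac{1}{2} \bar{k}_B \sum_k W_{2B}(k) \log \sigma_{BA}(k) \tag{4.13} $$

in which $\rho_{AB}(k)$ and $\sigma_{AB}(k)$ are to be solved from

$$ \rho_{AB}(k) = \sum_{k'} \Pi_B(k, k') \, W_{2A}(k') \, \sigma_{AB}^{-1}(k') \tag{4.14} $$

$$ \sigma_{AB}(k) = \sum_{k'} \Pi_B(k', k) \, W_{1A}(k') \, \rho_{AB}^{-1}(k') \tag{4.15} $$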



Whenever $p_A = p_B$ or $\Pi_A = \Pi_B$ (or both), the solution of (4.14, 4.15) will be
$\rho_{AB}(k) = \sigma_{AB}(k) = 1$ for all $k$. Hence the last two lines of (4.13) represent
corrections to the distance formula, that reflect interference between the constraints
imposed by prescribed degree statistics and those imposed by prescribed degree
correlations‡.
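
The coupled equations for $\rho_{AB}$ and $\sigma_{AB}$ have no closed-form solution in general, but they can be iterated numerically. The sketch below alternates the two updates, which amounts to a Sinkhorn-type matrix scaling; this iteration scheme is a choice made here, not a method taken from the text. It assumes that all degree pairs have been mapped to integer labels `0..m-1` shared by both networks, and that every entry of the matrices and marginals involved is strictly positive. All function names are hypothetical.

```python
def correlation_ratio(w):
    """Return Pi(k,k') = W(k,k') / (W_1(k) W_2(k')) together with the marginals W_1, W_2."""
    m = len(w)
    w1 = [sum(w[k]) for k in range(m)]
    w2 = [sum(w[k][kp] for k in range(m)) for kp in range(m)]
    pi = [[w[k][kp] / (w1[k] * w2[kp]) for kp in range(m)] for k in range(m)]
    return pi, w1, w2

def solve_rho_sigma(pi_b, w1a, w2a, sweeps=1000, tol=1e-13):
    """Alternate the updates for rho_AB and sigma_AB until they stop changing."""
    m = len(w1a)
    rho = [1.0] * m
    sigma = [1.0] * m
    for _ in range(sweeps):
        new_rho = [sum(pi_b[k][kp] * w2a[kp] / sigma[kp] for kp in range(m))
                   for k in range(m)]
        new_sigma = [sum(pi_b[kp][k] * w1a[kp] / new_rho[kp] for kp in range(m))
                     for k in range(m)]
        change = max(abs(x - y) for x, y in zip(new_rho + new_sigma, rho + sigma))
        rho, sigma = new_rho, new_sigma
        if change < tol:
            break
    return rho, sigma

# Assumed toy kernels over two degree classes.
w_a = [[0.1, 0.2], [0.3, 0.4]]
w_b = [[0.4, 0.1], [0.2, 0.3]]
pi_b, w1b, w2b = correlation_ratio(w_b)
_, w1a, w2a = correlation_ratio(w_a)
rho_ab, sigma_ab = solve_rho_sigma(pi_b, w1a, w2a)
```

The solution is only defined up to the rescaling $\rho \to \lambda\rho$, $\sigma \to \sigma/\lambda$, which leaves the interference terms of the distance unchanged because both marginals sum to one. When the two networks have identical marginals the iteration stops immediately at $\rho_{AB} = \sigma_{AB} = 1$.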

We note, finally, that although definition (4.1) requires that the networks $A$ and
$B$ have the same number of nodes, the final form (4.13) of our formula does not depend
on the (relative) network sizes. Hence we will apply the result (4.13) also to networks
of different sizes, provided both are sufficiently large, which makes (4.13) more widely
applicable to real networks (which will in general be large, but of different sizes).

‡ A similar interference term was erroneously omitted from [1], which can be confirmed by retracing
the above arguments and the calculations in Appendix A for nondirected graphs. We will summarize
and compare our results for directed and nondirected graphs below.



5. Tests, comparisons, and applications 



5.1. Simple special cases 



If the in-degrees are statistically independent of the out-degrees, i.e.
$p(k) = p(k^{\rm in}) p(k^{\rm out})$, the entropy per node (2.19) of the ensemble (2.1) with
prescribed degree statistics but no degree correlations simplifies to

$$ S = \bar{k} \big[ \log(N/\bar{k}) + 1 \big] - \sum_{k^{\rm in}} p(k^{\rm in}) \log \Big[ \frac{p(k^{\rm in})}{\pi_{\bar{k}}(k^{\rm in})} \Big] - \sum_{k^{\rm out}} p(k^{\rm out}) \log \Big[ \frac{p(k^{\rm out})}{\pi_{\bar{k}}(k^{\rm out})} \Big] + \epsilon_N \tag{5.1} $$

with $\lim_{N \to \infty} \epsilon_N = 0$. This, according to [1], is the sum of the individual entropies of
the 'out-graph' ensemble and the 'in-graph' ensemble, calculated as though they were
considered as two separate undirected networks. In ensembles with degree correlations,
i.e. (3.1), with entropy per node (3.9), the additional term that represents the entropy
reduction imposed by the degree correlations does not simplify as a result of assuming
$p(k) = p(k^{\rm in}) p(k^{\rm out})$; the degree correlations can generate statistical relations between
in- and out-degrees that are not visible in $p(k)$.

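A regular directed graph is one where each node has the same in- and the same
out-degree. Since for a well-defined directed graph we also have
$\sum_k p(k) k^{\rm in} = \sum_k p(k) k^{\rm out} = \bar{k}$, any regular directed graph must have
$p(k) = \delta_{k, (\bar{k}, \bar{k})}$. This, in turn, implies also that
$W(k, k') = \delta_{k, (\bar{k}, \bar{k})} \, \delta_{k', (\bar{k}, \bar{k})}$. It is then impossible to have degree
correlations, and both equation (2.19) and (3.9) reduce to

$$ S = \bar{k} \big[ \log(N \bar{k}) - 1 \big] - 2 \log(\bar{k}!) + \epsilon_N \tag{5.2} $$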
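
For completeness, the reduction to the regular-graph result can be written out explicitly. The following worked step is a sketch that inserts $p(k) = \delta_{k,(\bar{k},\bar{k})}$ into the entropy (2.19), using $\log \pi_{\bar{k}}(\bar{k}) = -\bar{k} + \bar{k}\log\bar{k} - \log(\bar{k}!)$:

```latex
\begin{align*}
S &= \bar{k}\big[\log(N/\bar{k}) + 1\big] + 2\log \pi_{\bar{k}}(\bar{k}) + \epsilon_N \\
  &= \bar{k}\log N - \bar{k}\log\bar{k} + \bar{k}
     - 2\bar{k} + 2\bar{k}\log\bar{k} - 2\log(\bar{k}!) + \epsilon_N \\
  &= \bar{k}\big[\log(N\bar{k}) - 1\big] - 2\log(\bar{k}!) + \epsilon_N
\end{align*}
```

The mutual information term of (3.9) vanishes identically here, since $W(k,k')$ is concentrated on a single pair of degree values and therefore factorises over its marginals.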



5.2. Comparison of formulae for undirected versus directed networks 

It is instructive to give an overview of the similarities and differences between directed
and nondirected graphs. In addition to entropies per node, we will also compare entropic
results in terms of complexities. The degree complexity per node $C_{\rm deg}$ of a graph $c$
is the difference between the entropy per node of the associated ensemble (2.1) and
the value $S_0[\bar{k}]$ that is found for the entropy per node if only the average connectivity
$\bar{k}$ is prescribed (i.e. for an ensemble with Poisson distributed degrees). The wiring
complexity $C_{\rm wir}$ is the further entropy reduction that results if we go from the ensemble
(2.1) to the ensemble (3.1) where also the degree-degree correlations are imposed. Our
results can then be summarized as in table 1.

Similarly we can compare the formulae for the information-theoretic distance
$D_{AB}$ between two networks $c_A$ and $c_B$, for directed versus nondirected ones. This
gives in both cases $\lim_{N \to \infty} D_{AB} = D_{AB}^{\rm deg} + D_{AB}^{\rm wir} + D_{AB}^{\rm int}$, where
$D_{AB}^{\rm deg}$ is the direct contribution from degree distribution dissimilarity, $D_{AB}^{\rm wir}$ is
the direct contribution from degree-correlation dissimilarity, and $D_{AB}^{\rm int}$ accounts for
the interference between degree statistics and the possible degree correlations that
could be achieved. Our distance results can then be summarized in table 2.
| | directed graphs | nondirected graphs |
|---|---|---|
| $S_0[\bar{k}]$ | $\bar{k} [\log(N/\bar{k}) + 1]$ | $\frac{1}{2} \bar{k} [\log(N/\bar{k}) + 1]$ |
| $C_{\rm deg}[p]$ | $\sum_k p(k) \log \Big[ \frac{p(k)}{\pi_{\bar{k}}(k^{\rm in}) \pi_{\bar{k}}(k^{\rm out})} \Big]$ | $\sum_k p(k) \log \Big[ \frac{p(k)}{\pi_{\bar{k}}(k)} \Big]$ |
| $C_{\rm wir}[p, W]$ | $\bar{k} \sum_{k, k'} W(k, k') \log \Big[ \frac{W(k, k')}{W_1(k) W_2(k')} \Big]$ | $\frac{1}{2} \bar{k} \sum_{k, k'} W(k, k') \log \Big[ \frac{W(k, k')}{W(k) W(k')} \Big]$ |

Table 1. Comparison of entropies and complexities of directed versus nondirected
graphs. The entropy per node is given by $S[p, W] = S_0[\bar{k}] - C_{\rm deg}[p] - C_{\rm wir}[p, W]$,
modulo finite size corrections. For ensembles in which only the average
connectivity $\bar{k}$ is prescribed one would find the value $S_0[\bar{k}]$. The quantities
$C_{\rm deg}[p]$ and $C_{\rm wir}[p, W]$ measure the entropy reductions caused by subsequently
imposing a degree distribution $p$, and the joint distribution $W$ of connected
nodes, and can therefore be identified with the degree complexity and the
wiring complexity of the typical graphs in our ensembles. In directed graphs
$k = (k^{\rm in}, k^{\rm out})$, where $k_i^{\rm in}(c) = \sum_j c_{ij}$ and $k_i^{\rm out}(c) = \sum_j c_{ji}$, and
$W(k, k') = (N\bar{k})^{-1} \sum_{ij} c_{ij} \delta_{k, k_i} \delta_{k', k_j}$. In nondirected graphs one has only
$k_i(c) = \sum_j c_{ij}$, and $W(k, k') = (N\bar{k})^{-1} \sum_{ij} c_{ij} \delta_{k, k_i} \delta_{k', k_j}$.



The functions $\rho_{AB}(k)$ and $\sigma_{AB}(k)$ are solved from (4.14, 4.15). Repeating the
calculation for nondirected graphs shows that there only one function $\rho_{AB}(k)$ is
required (or equivalently, $\rho_{AB} = \sigma_{AB}$), which is the solution of

$$ \rho_{AB}(k) = \sum_{k'} \Pi_B(k, k') \, W_A(k') \, \rho_{AB}^{-1}(k') \tag{5.3} $$

5.3. Application to gene regulation networks 

A gene regulation network can be viewed as a directed graph, where the nodes
represent genes and the arcs indicate whether ($c_{ij} = 1$) or not ($c_{ij} = 0$) the protein
synthesized from gene $j$ acts as a regulator of gene $i$. In the present binary set-up,
where $c_{ij} \in \{0, 1\}$, one disregards information on the nature of regulation, i.e. whether
it involves repression or activation.

In tables 3 and 4 we show the results of calculating the various contributions to
the entropy of the ensemble associated with the networks of [9] and [11] respectively.
Imposing only the correct average degree gives the entropy $S_0[\bar{k}]$. Imposing in addition
the correct degree distribution (i.e. representing the network by ensemble (2.1))
gives the entropy $S_0[\bar{k}] - C_{\rm deg}[p]$. Imposing additionally the correct degree-degree
correlations (i.e. representing the network by ensemble (3.1)) reduces the entropy still
further to $S_0[\bar{k}] - C_{\rm deg}[p] - C_{\rm wir}[p, W]$.

In both tables we also show the entropies per arc, defined as $S' = S/\bar{k}$. The latter
are normalised for the average degree. This fits in with the 'arc centric' view that the
calculations in this paper and its predecessor [1] seem to have steered us towards, where
the final answers are consistently found to be most elegantly formulated in terms of the
joint distribution $W$ of degrees at either end of an arc.
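| | directed graphs | nondirected graphs |
|---|---|---|
| $D_{AB}^{\rm deg}$ | $\frac{1}{2} \sum_k p_A(k) \log \Big[ \frac{p_A(k)}{p_B(k)} \Big] + \frac{1}{2} \sum_k p_B(k) \log \Big[ \frac{p_B(k)}{p_A(k)} \Big]$ | $\frac{1}{2} \sum_k p_A(k) \log \Big[ \frac{p_A(k)}{p_B(k)} \Big] + \frac{1}{2} \sum_k p_B(k) \log \Big[ \frac{p_B(k)}{p_A(k)} \Big]$ |
| $D_{AB}^{\rm wir}$ | $\frac{1}{2} \bar{k}_A \sum_{k, k'} W_A(k, k') \log \Big[ \frac{\Pi_A(k, k')}{\Pi_B(k, k')} \Big] + \frac{1}{2} \bar{k}_B \sum_{k, k'} W_B(k, k') \log \Big[ \frac{\Pi_B(k, k')}{\Pi_A(k, k')} \Big]$ | $\frac{1}{4} \bar{k}_A \sum_{k, k'} W_A(k, k') \log \Big[ \frac{\Pi_A(k, k')}{\Pi_B(k, k')} \Big] + \frac{1}{4} \bar{k}_B \sum_{k, k'} W_B(k, k') \log \Big[ \frac{\Pi_B(k, k')}{\Pi_A(k, k')} \Big]$ |
| $D_{AB}^{\rm int}$ | $\frac{1}{2} \bar{k}_A \sum_{k, k'} W_A(k, k') \log [\rho_{AB}(k) \sigma_{AB}(k')] + \frac{1}{2} \bar{k}_B \sum_{k, k'} W_B(k, k') \log [\rho_{BA}(k) \sigma_{BA}(k')]$ | $\frac{1}{2} \bar{k}_A \sum_k W_A(k) \log \rho_{AB}(k) + \frac{1}{2} \bar{k}_B \sum_k W_B(k) \log \rho_{BA}(k)$ |

Table 2. Comparison of the contributions to the distance
$\lim_{N \to \infty} D_{AB} = D_{AB}^{\rm deg} + D_{AB}^{\rm wir} + D_{AB}^{\rm int}$ between graphs $c_A$ and $c_B$.
Notation conventions are mostly as in the caption of table 1. The degree correlation
ratios $\Pi$ are defined as $\Pi(k, k') = W(k, k') / W_1(k) W_2(k')$ (for directed graphs) and
$\Pi(k, k') = W(k, k') / W(k) W(k')$ (for nondirected graphs). The functions $\rho_{AB}(k)$ and
$\sigma_{AB}(k)$ (for directed graphs) are the solutions of equations (4.14, 4.15). The
functions $\rho_{AB}(k)$ (for nondirected graphs) are to be solved from equation (5.3).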



In [9] Hughes et al. used a two-color cDNA micro-array hybridization assay to
generate expression profiles in yeast for 276 deletion mutants. We followed an approach
published by Rung et al. [10] to construct a network from this data. Two genes g1, g2
are connected by an arc from g1 to g2 if the ratio of the expression level in the mutant
where gene g1 is deleted versus the background standard deviation in the wild-type
strain is larger than a threshold. In this way, we arrived at a directed network with
$N = 5654$ nodes (genes), with an average degree $\bar{k} \approx 5.6$. The degree distribution of
this network is characterised by a high frequency of occurrence of low degree nodes; the
set of nodes with out-degree zero and in-degree less than 4 covers more than 50% of
the set. However, the network also contains some nodes with very high out-degree.

The authors of [11], Harbison et al., reported on a study of DNA binding
transcriptional regulators in yeast. For each of the 203 transcription factors tested
they report the genes where the transcription factor bound to the putative promoter
region. Similar to a previous study [12] we constructed a network by connecting
gene g1, which encodes a transcription factor, to gene g2 if the measurements were
statistically significant ($P < 0.001$). Their data were represented as a directed network
of $N = 3865$ nodes, with an average degree of $\bar{k} \approx 2.81$. Compared with the data of [9],
the network of [11] is more sparse. It does, however, show a similar degree distribution
pattern; in fact over 50% of the nodes have zero out-degree and an in-degree of less
than 2.

Gene regulation network of Hughes et al. (2000)

| Imposed topological property | Entropy per node | Entropy per arc |
|---|---|---|
| average degree $\bar{k}$ | 44.5 | 7.9 |
| degree distribution $p(k)$ | 19.5 | 3.5 |
| degree-degree correlations $\Pi(k, k')$ | 17.9 | 3.2 |

Table 3. The tailoring of random graph ensembles by imposing as constraints the
values of increasingly prescriptive macroscopic topological features measured in
the gene regulation network of [9]. This tailoring reduces the entropy per node $S$ in
the ensemble in stages, and thereby the effective number of graphs $\mathcal{N} = \exp[NS]$
compatible with the network of [9]. We observe that, in this example, refining
the tailoring of the graph ensemble from imposing only the correct average degree
to imposing the correct degree distribution is more significant than the further
refinement of imposing the correct degree-degree correlations. Hence the degree
complexity of this network is significantly larger than the wiring complexity.



Gene regulation network of Harbison et al. (2004)

Imposed topological property           Entropy per node    Entropy per arc
average degree k                       23.2                8.2
degree distribution p(k)               12.8                4.5
degree-degree correlations Π(k, k')    11.6                4.1



Table 4. The tailoring of random graph ensembles by imposing as constraints 
the values of increasingly prescriptive macroscopic topological features measured 
in the gene regulation network of [11]. The tailoring reduces the entropy per 
node S in the ensemble in stages, and thereby the effective number of graphs 
$\mathcal{N} = \exp[NS]$ compatible with the network of [11]. As in the previous example, 
refining the tailoring of the graph ensemble from imposing only the correct average 
degree to imposing the correct degree distribution is more significant than the 
further refinement of imposing the correct degree-degree correlations. Hence the 
degree complexity of this network is again significantly larger than the wiring 
complexity. 
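
The arithmetic behind the comparison in tables 3 and 4 can be reproduced with a short Python sketch. The numbers are copied from the tables and from the network sizes quoted in the text; the dictionary layout and the function name `summarise` are our own. We read the degree complexity as the entropy reduction from imposing the degree distribution on top of the average degree, and the wiring complexity as the further reduction from imposing the degree-degree correlations; we also assume that the entropy per arc is the entropy per node divided by the average degree, which holds for the tabulated values up to rounding.

```python
# Values taken from tables 3 and 4: entropy per node when imposing
# (average degree, degree distribution, degree-degree correlations).
NETWORKS = {
    "Hughes et al. [9]":    {"N": 5654, "kbar": 5.6,  "S": (44.5, 19.5, 17.9)},
    "Harbison et al. [11]": {"N": 3865, "kbar": 2.81, "S": (23.2, 12.8, 11.6)},
}

def summarise(net):
    """Derived quantities for one network (all per node unless stated)."""
    s_avg, s_deg, s_corr = net["S"]
    return {
        "degree_complexity": s_avg - s_deg,
        "wiring_complexity": s_deg - s_corr,
        "entropy_per_arc": [s / net["kbar"] for s in net["S"]],
        # log of the effective number of graphs, log(exp[N S]) = N S
        "log_num_graphs": [net["N"] * s for s in net["S"]],
    }

for name, net in NETWORKS.items():
    out = summarise(net)
    print(name,
          round(out["degree_complexity"], 1),
          round(out["wiring_complexity"], 1))
```

Under this reading the degree complexity is 25.0 (Hughes et al.) and 10.4 (Harbison et al.) per node, against wiring complexities of 1.6 and 1.2 respectively.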

In practice, when the gene network data are collected, a decision has to be made 
about the cut-off point where the effect of one gene product on another gene is so 
small as to be considered insignificant. If there were no threshold and every small 
fluctuation were taken to be evidence of co-regulation, then it would appear that every 
gene regulated every other gene, and the network would be complete. Conversely, 
setting too strict a threshold will risk missing out on important but subtle interactions. 

Making the threshold stricter would reduce the number of arcs, and hence make the 
network more sparse with lower average degree. Our base assumption would be that 
beyond that, the main qualitative features of the topology would be maintained. 
That is, the stricter threshold would remove arcs indiscriminately across the network. 
However, it is possible that, for example, a node would appear to be a 'hub' under a 
lenient criterion, but would lose a large number of interactions under stricter criteria, 
so that it is no longer a hub: this would be a qualitative change to the topology arising 
from the change in thresholds. The analysis proposed in this paper measures 
topological properties of the network rather than the network itself, so we would 
expect the results to vary only insofar as those topological properties vary. Figure 1 
shows the results of repeating the analysis above for different values of the thresholds. 




Figure 1. Each bar on the chart represents a different choice of threshold. 
Moving from left to right, the threshold is made progressively stricter so as 
to exclude approximately 3 percent of arcs at each step. The left half refers 
to the Harbison et al. [11] data; the right half refers to the Hughes et al. [9] data. 
Within a bar, the top line presents the entropy per arc when the constraint is 
'average degree'; the next line shows the entropy per arc when the constraint is 
additionally 'degree distribution'; and the final line gives the entropy per arc 
for the ensemble additionally targeting the 'degree-degree correlation'. Hence the 
top two shaded areas represent the degree complexity and the wiring complexity 
respectively. Both datasets are plotted on the same axis in order to illustrate that, 
although there is some movement with different thresholds, the results for the two 
networks are not unduly sensitive to the threshold and remain distinct for any 
reasonable choice. 
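
A threshold sweep of the kind summarised in figure 1 could be organised along the following lines. This Python sketch is not the procedure used to produce the figure: the function name, the synthetic `score` matrix and the choice to remove 3 percent of the *initial* number of arcs at each step are assumptions made for illustration, and the entropy calculation itself is left out.

```python
import numpy as np

def sweep_thresholds(score, base_threshold, n_steps, frac=0.03):
    """Tighten the threshold so that each step drops about `frac` of the
    arcs present at `base_threshold`. Returns (threshold, arcs, mean degree)."""
    score = np.asarray(score, dtype=float)
    off_diag = ~np.eye(score.shape[0], dtype=bool)      # ignore self-interactions
    kept = np.sort(score[off_diag & (score > base_threshold)])
    n0 = kept.size
    results = []
    for step in range(n_steps + 1):
        n_drop = int(round(step * frac * n0))
        # new threshold = largest score among the arcs to be removed
        thr = base_threshold if n_drop == 0 else kept[n_drop - 1]
        n_arcs = int(np.count_nonzero(off_diag & (score > thr)))
        results.append((thr, n_arcs, n_arcs / score.shape[0]))
    return results

# Synthetic scores, for illustration only
rng = np.random.default_rng(0)
score = rng.exponential(size=(200, 200))
sweep = sweep_thresholds(score, base_threshold=2.0, n_steps=5)
```

Each entry of `sweep` defines one network, to which the entropy formulae of the paper would then be applied to obtain one bar of the chart.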

The above data all refer to the same organism, yeast; however, they present 
different aspects of gene interactions. Hence, even more than for protein-protein 
interaction networks, comparison must be done cautiously. The heterogeneity in the 
data sets emphasises the importance of developing a suite of tools and measures that 
can be used to study each network independently. 

6. Discussion 

In this paper we have derived several mathematical results for directed random 
graph ensembles tailored to match chosen properties of real-world networks. We 
have calculated the Shannon entropy of ensembles constrained by a prescribed 
degree distribution, and of ensembles constrained by a prescribed degree-degree 
correlation function (which contains more detailed topological information than the 
degree distribution). We have also defined a rational information-theoretic distance 
measure for comparing networks based on their degree distribution and degree-degree 
correlation. All this complements and generalises earlier work done in [1] for 
nondirected networks. We also identified a correction term to the distance measure 
of nondirected graphs which was absent in [1]. A summary of our results and how 
they compare with the corresponding formulae for nondirected networks is presented 
in tables 1 and 2. 

Our growing suite of quantitative tools can be used to study the properties of large 
real-world networks. These tools are precise in leading order in N, and take the form of 
explicit and transparent formulae which use easily measurable macroscopic parameters 
as input. The present generalization to directed networks enables their application 
to gene regulation networks. We trust that the benefits of having explicit formulae for 
network complexities and information-theoretic dissimilarity measures will increase, 
especially in bioinformatics, as we gain experience with using and interpreting the 
method, and as we increase the range of topological properties to which we can tailor 
our graph ensembles. 

The focus of our future work will be to increase the number of topological 
properties that we can characterise, measure, and impose upon tailored random graph 
ensembles. Significant progress has already been made towards including distributions 
of so-called generalised degrees, but our priority will be to focus on observables that 
measure the statistics of short loops. In the presence of such loops the methods and 
ideas that we applied so far will no longer suffice. However, short loops appear to be 
key biological motifs, so progress in this direction should yield substantial benefits in 
terms of applicability of the method in biological signalling. 

References 

[1] Annibale A, Coolen A C C, Fernandes L P, Fraternali F and Kleinjung J 2009 J. Phys. A 42(48):485001 
[2] Coolen A C C, Martino A D and Annibale A 2009 J. Stat. Phys. 136:1035-1067 
[3] Fernandes L P, Annibale A, Kleinjung J, Coolen A C C and Fraternali F 2010 PLoS ONE 5(8):e12083 
[4] Coolen A C C, Fraternali F, Annibale A, Fernandes L and Kleinjung J 2011 Handbook of Statistical Systems Biology (in press). Wiley 
[5] Memisevic V, Milenkovic T and Przulj N 2010 Journal of Integrative Bioinformatics 7(3):120 
[6] Albert R and Barabasi A L 2002 Rev. Mod. Phys. 74(1):47-97 
[7] Dorogovtsev S N, Goltsev A V and Mendes J F F 2008 Rev. Mod. Phys. 80(4):1275-1335 
[8] Bianconi G, Coolen A C C and Vicente C J P 2008 Phys. Rev. E 78:016114 
[9] Hughes T R et al. 2000 Cell 102(1) 
[10] Rung J, Schlitt T, Brazma A, Freivalds K and Vilo J 2002 Bioinformatics (Oxford, England) 18(Suppl 2) 
[11] Harbison C T et al. 2004 Nature 431:99-104 
[12] Schlitt T, Palin K, Rung J, Dietmann S, Lappe M, Ukkonen E and Brazma A 2003 Genome Res. 13(12):2568-2576 




Appendix A. Order parameter representation of the graph probabilities 

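In this section we derive a tool that is used repeatedly in this paper: a formula, 
in terms of simple observables and order parameters, for the log-probability per node 
of graphs (3.5) given the ensemble definition (3.1), in leading orders in $N$. Upon 
substituting (3.1) into this formula, and after some simple manipulations and use of 
the law of large numbers, one finds 

\[ \Omega(\mathbf{c}|p,Q) = \sum_{\vec{k}} p(\vec{k}|\mathbf{c}) \log p(\vec{k}) + \phi_1(\mathbf{c}|Q) - \phi_2(\mathbf{c}|Q) + \epsilon_N \tag{A.1} \]

\[ \phi_1(\mathbf{c}|Q) = \frac{1}{N} \log w(\mathbf{c}|\vec{k}_1,\ldots,\vec{k}_N,Q) \Big|_{\vec{k}_i = \vec{k}_i(\mathbf{c})\ \forall i} \tag{A.2} \]

\[ \phi_2(\mathbf{c}|Q) = \frac{1}{N} \log Z(\vec{k}_1,\ldots,\vec{k}_N,Q) \Big|_{\vec{k}_i = \vec{k}_i(\mathbf{c})\ \forall i} \tag{A.3} \]

with $\epsilon_N \to 0$ as $N \to \infty$, and 

\[ Z(\vec{k}_1,\ldots,\vec{k}_N,Q) = \sum_{\mathbf{c}} w(\mathbf{c}|\vec{k}_1,\ldots,\vec{k}_N,Q) \prod_i \delta_{\vec{k}_i,\vec{k}_i(\mathbf{c})} \tag{A.4} \]

\[ w(\mathbf{c}|\vec{k}_1,\ldots,\vec{k}_N,Q) = \prod_{i \neq j} \Big[ \frac{\bar{k}}{N} Q(\vec{k}_i,\vec{k}_j|\hat{p})\, \delta_{c_{ij},1} + \Big( 1 - \frac{\bar{k}}{N} Q(\vec{k}_i,\vec{k}_j|\hat{p}) \Big) \delta_{c_{ij},0} \Big] \tag{A.5} \]

In these expressions $\bar{k} = N^{-1}\sum_i k_i^{\rm in} = N^{-1}\sum_i k_i^{\rm out}$ and $\hat{p}(\vec{k}) = N^{-1}\sum_i \delta_{\vec{k},\vec{k}_i}$, and the 
kernel $Q(\cdot,\cdot)$ is normalized locally according to $\sum_{\vec{k},\vec{k}'} \hat{p}(\vec{k})\hat{p}(\vec{k}')Q(\vec{k},\vec{k}'|\hat{p}) = 1$. 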

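Appendix A.1. Calculation of $\phi_1$ 

The first contribution (A.2) to the entropy is calculated easily: 

\[ \phi_1(\mathbf{c}|Q) = \frac{1}{N} \log w(\mathbf{c}|\vec{k}_1,\ldots,\vec{k}_N,Q) \Big|_{\vec{k}_i = \vec{k}_i(\mathbf{c})\ \forall i} = \bar{k}(\mathbf{c}) \Big\{ \log\Big[ \frac{\bar{k}(\mathbf{c})}{N} \Big] - 1 + \sum_{\vec{k},\vec{k}'} W(\vec{k},\vec{k}'|\mathbf{c}) \log Q(\vec{k},\vec{k}'|p(\cdot|\mathbf{c})) \Big\} + o(1) \tag{A.6} \]

It involves the joint in- and out-degree distribution $p(\vec{k}|\mathbf{c})$, its average degree $\bar{k}(\mathbf{c})$, and 
the joint distribution $W(\vec{k},\vec{k}'|\mathbf{c})$ of in- and out-degrees of connected nodes. All are 
calculated for the graph $\mathbf{c}$ and defined as 

\[ p(\vec{k}|\mathbf{c}) = \frac{1}{N} \sum_i \delta_{\vec{k},\vec{k}_i(\mathbf{c})}, \qquad \bar{k}(\mathbf{c}) = \frac{1}{N} \sum_i k_i^{\rm in}(\mathbf{c}) = \frac{1}{N} \sum_i k_i^{\rm out}(\mathbf{c}) \tag{A.7} \]

\[ W(\vec{k},\vec{k}'|\mathbf{c}) = \frac{1}{N\bar{k}(\mathbf{c})} \sum_{ij} c_{ij}\, \delta_{\vec{k},\vec{k}_i(\mathbf{c})}\, \delta_{\vec{k}',\vec{k}_j(\mathbf{c})} \tag{A.8} \]

They are related via the two identities 

\[ W_1(\vec{k}|\mathbf{c}) = \sum_{\vec{k}'} W(\vec{k},\vec{k}'|\mathbf{c}) = \frac{k^{\rm in}}{\bar{k}(\mathbf{c})}\, p(\vec{k}|\mathbf{c}) \tag{A.9} \]

\[ W_2(\vec{k}|\mathbf{c}) = \sum_{\vec{k}'} W(\vec{k}',\vec{k}|\mathbf{c}) = \frac{k^{\rm out}}{\bar{k}(\mathbf{c})}\, p(\vec{k}|\mathbf{c}) \tag{A.10} \]

The kernel in (A.6) is normalized according to $\sum_{\vec{k},\vec{k}'} p(\vec{k}|\mathbf{c})\,p(\vec{k}'|\mathbf{c})\,Q(\vec{k},\vec{k}'|p(\cdot|\mathbf{c})) = 1$. 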
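
The observables entering (A.6) are simple to measure on a given graph. The following Python sketch computes the joint degree distribution, the average degree and the kernel $W$ from an adjacency matrix, and lets one check the two marginal identities numerically. It is an illustration only: the function name and the toy graph are hypothetical, and we assume the convention that `c[i, j] = 1` contributes to the in-degree of node i and the out-degree of node j, which is the convention under which the first marginal of $W$ carries the factor $k^{\rm in}$.

```python
import numpy as np
from collections import Counter

def degree_statistics(c):
    """Return p(k|c), kbar(c) and W(k,k'|c), with k = (k_in, k_out) tuples."""
    c = np.asarray(c)
    n = c.shape[0]
    k_in = c.sum(axis=1)                  # assumed: k_i^in  = sum_j c_ij
    k_out = c.sum(axis=0)                 # assumed: k_j^out = sum_i c_ij
    kbar = c.sum() / n
    degs = list(zip(k_in.tolist(), k_out.tolist()))
    p = {k: cnt / n for k, cnt in Counter(degs).items()}
    W = Counter()
    for i, j in zip(*np.nonzero(c)):      # every arc contributes 1/(N kbar)
        W[(degs[i], degs[j])] += 1.0 / (n * kbar)
    return p, kbar, dict(W)

# Hypothetical toy graph with four nodes and four arcs
c = [[0, 1, 0, 1],
     [0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0]]
p, kbar, W = degree_statistics(c)

# Marginals of W: sum over the second and over the first argument
W1 = {k: sum(v for (a, _), v in W.items() if a == k) for k in p}
W2 = {k: sum(v for (_, b), v in W.items() if b == k) for k in p}
```

For each degree pair `k = (k_in, k_out)` in `p`, `W1[k]` should equal `k_in * p[k] / kbar` and `W2[k]` should equal `k_out * p[k] / kbar`.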
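Appendix A.2. Calculation of $\phi_2$ 

In order to calculate (A.3) we first work out the following quantity, which will then 
have to be evaluated at $(\vec{k}_1,\ldots,\vec{k}_N) = (\vec{k}_1(\mathbf{c}),\ldots,\vec{k}_N(\mathbf{c}))$: 

\[ \phi_2(\vec{k}_1,\ldots,\vec{k}_N|Q) = \frac{1}{N} \log Z(\vec{k}_1,\ldots,\vec{k}_N,Q) = \frac{1}{N} \log \int \prod_i \Big[ \frac{d\omega_i\, d\psi_i}{4\pi^2}\, e^{\mathrm{i}[\omega_i k_i^{\rm in} + \psi_i k_i^{\rm out}]} \Big] L(\omega,\psi|\hat{p},Q) \tag{A.11} \]

with 

\[ L(\omega,\psi|\hat{p},Q) = \prod_{i \neq j} \Big[ 1 + \frac{\bar{k}}{N} Q(\vec{k}_i,\vec{k}_j|\hat{p}) \big[ e^{-\mathrm{i}(\omega_i+\psi_j)} - 1 \big] \Big] = \exp\Big[ \frac{\bar{k}}{N} \sum_{ij} Q(\vec{k}_i,\vec{k}_j|\hat{p}) \big[ e^{-\mathrm{i}(\omega_i+\psi_j)} - 1 \big] + \mathcal{O}(N^0) \Big] \tag{A.12} \]

Upon introducing $R(\vec{k}|\omega) = N^{-1}\sum_i \delta_{\vec{k},\vec{k}_i} e^{-\mathrm{i}\omega_i}$ and $S(\vec{k}|\psi) = N^{-1}\sum_i \delta_{\vec{k},\vec{k}_i} e^{-\mathrm{i}\psi_i}$, and 
inserting $\int \prod_{\vec{k}} \big[ dR(\vec{k})\, dS(\vec{k})\, \delta[R(\vec{k}) - R(\vec{k}|\omega)]\, \delta[S(\vec{k}) - S(\vec{k}|\psi)] \big]$ with $\delta$-functions written 
in integral form, we can write 

\[ L(\omega,\psi|\hat{p},Q) = \int \prod_{\vec{k}} \Big[ \frac{dR(\vec{k})\, d\hat{R}(\vec{k})\, dS(\vec{k})\, d\hat{S}(\vec{k})}{4\pi^2/N^2} \Big] \exp\Big\{ \mathrm{i}N \sum_{\vec{k}} \big[ \hat{R}(\vec{k})R(\vec{k}) + \hat{S}(\vec{k})S(\vec{k}) \big] - \mathrm{i} \sum_i \big[ \hat{R}(\vec{k}_i) e^{-\mathrm{i}\omega_i} + \hat{S}(\vec{k}_i) e^{-\mathrm{i}\psi_i} \big] + N\bar{k} \sum_{\vec{k},\vec{k}'} R(\vec{k}) Q(\vec{k},\vec{k}'|\hat{p}) S(\vec{k}') - N\bar{k} + \mathcal{O}(N^0) \Big\} \tag{A.13} \]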

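Substituting this back into $\phi_2$, and using the law of large numbers, then gives 

\[ \phi_2(\ldots) = \frac{1}{N} \log \int \prod_{\vec{k}} \big[ dR(\vec{k})\, d\hat{R}(\vec{k})\, dS(\vec{k})\, d\hat{S}(\vec{k}) \big]\, e^{N\Psi[R,\hat{R},S,\hat{S}|\hat{p},Q] + \mathcal{O}(\log N)} \tag{A.14} \]

where 

\[ \Psi[R,\hat{R},S,\hat{S}|\hat{p},Q] = \mathrm{i} \sum_{\vec{k}} \big[ \hat{R}(\vec{k})R(\vec{k}) + \hat{S}(\vec{k})S(\vec{k}) \big] + \bar{k} \sum_{\vec{k},\vec{k}'} R(\vec{k}) Q(\vec{k},\vec{k}'|\hat{p}) S(\vec{k}') - \bar{k} + \sum_{\vec{k}} \hat{p}(\vec{k}) \Big\{ \log \int_{-\pi}^{\pi} \frac{d\omega}{2\pi}\, e^{\mathrm{i}[\omega k^{\rm in} - \hat{R}(\vec{k}) e^{-\mathrm{i}\omega}]} + \log \int_{-\pi}^{\pi} \frac{d\psi}{2\pi}\, e^{\mathrm{i}[\psi k^{\rm out} - \hat{S}(\vec{k}) e^{-\mathrm{i}\psi}]} \Big\} \tag{A.15} \]

After doing the remaining integrals over $\omega$ and $\psi$ we get 

\[ \Psi[R,\hat{R},S,\hat{S}|\hat{p},Q] = \mathrm{i} \sum_{\vec{k}} \big[ \hat{R}(\vec{k})R(\vec{k}) + \hat{S}(\vec{k})S(\vec{k}) \big] + \bar{k} \sum_{\vec{k},\vec{k}'} R(\vec{k}) Q(\vec{k},\vec{k}'|\hat{p}) S(\vec{k}') - \bar{k} + \sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm in} \log[-\mathrm{i}\hat{R}(\vec{k})] + \sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm out} \log[-\mathrm{i}\hat{S}(\vec{k})] - \sum_{\vec{k}} \hat{p}(\vec{k}) \log(k^{\rm in}!\, k^{\rm out}!) \tag{A.16} \]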
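
The step from (A.15) to (A.16) rests on the elementary identity $\int_{-\pi}^{\pi} \frac{d\omega}{2\pi}\, e^{\mathrm{i}[\omega k - \hat{R} e^{-\mathrm{i}\omega}]} = (-\mathrm{i}\hat{R})^k / k!$ for integer $k \geq 0$, which follows from expanding the exponential of $e^{-\mathrm{i}\omega}$ in a power series. A quick numerical check in Python is given below; the function names and test values are ours, and the integral is approximated by a uniform-grid average, which is very accurate for smooth periodic integrands.

```python
import numpy as np
from math import factorial

def omega_integral(k, R_hat, n_grid=2048):
    """(1/2pi) * integral over [-pi, pi) of exp(i*omega*k - i*R_hat*exp(-i*omega)),
    approximated by the mean of the integrand on a uniform grid."""
    omega = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    integrand = np.exp(1j * omega * k - 1j * R_hat * np.exp(-1j * omega))
    return integrand.mean()

def closed_form(k, R_hat):
    """Right-hand side of the identity: (-i R_hat)^k / k!."""
    return (-1j * R_hat) ** k / factorial(k)

# Compare both sides for a few degrees and two arbitrary values of R_hat
for R_hat in (0.7j, 0.3 - 0.4j):
    for k in range(6):
        assert abs(omega_integral(k, R_hat) - closed_form(k, R_hat)) < 1e-12
```

Taking the logarithm of the closed form gives the terms $k^{\rm in}\log[-\mathrm{i}\hat{R}]$ and $-\log k^{\rm in}!$ that appear in (A.16), and likewise for the out-degree.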
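For $N \to \infty$ the quantity $\phi_2(\vec{k}_1,\ldots,\vec{k}_N|Q)$ can be evaluated by steepest descent, 
giving $\lim_{N\to\infty} \phi_2(\ldots) = {\rm extr}_{R,\hat{R},S,\hat{S}}\, \Psi[R,\hat{R},S,\hat{S}|\hat{p},Q]$. Differentiation of $\Psi$ gives the 
following saddle-point equations: 

\[ -\mathrm{i}\hat{R}(\vec{k}) = \frac{\hat{p}(\vec{k})\, k^{\rm in}}{R(\vec{k})} = \bar{k} \sum_{\vec{k}'} Q(\vec{k},\vec{k}'|\hat{p})\, S(\vec{k}') \tag{A.17} \]

\[ -\mathrm{i}\hat{S}(\vec{k}) = \frac{\hat{p}(\vec{k})\, k^{\rm out}}{S(\vec{k})} = \bar{k} \sum_{\vec{k}'} Q(\vec{k}',\vec{k}|\hat{p})\, R(\vec{k}') \tag{A.18} \]

At the saddle-point we deduce that $\sum_{\vec{k},\vec{k}'} R(\vec{k}) Q(\vec{k},\vec{k}'|\hat{p}) S(\vec{k}') = 1$, and that 

\[ \Psi[R,\hat{R},S,\hat{S}|\hat{p},Q] = -2\bar{k} - \sum_{\vec{k}} \hat{p}(\vec{k}) \log(k^{\rm in}!\, k^{\rm out}!) + \sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm in} \log\Big[ \frac{\hat{p}(\vec{k})\, k^{\rm in}}{R(\vec{k}|\hat{p},Q)} \Big] + \sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm out} \log\Big[ \frac{\hat{p}(\vec{k})\, k^{\rm out}}{S(\vec{k}|\hat{p},Q)} \Big] \tag{A.19} \]

in which the functions $R(\vec{k}|\hat{p},Q)$ and $S(\vec{k}|\hat{p},Q)$ are the solutions of 

\[ R(\vec{k}) = \frac{\hat{p}(\vec{k})\, k^{\rm in}}{\bar{k} \sum_{\vec{k}'} Q(\vec{k},\vec{k}'|\hat{p})\, S(\vec{k}')}, \qquad S(\vec{k}) = \frac{\hat{p}(\vec{k})\, k^{\rm out}}{\bar{k} \sum_{\vec{k}'} Q(\vec{k}',\vec{k}|\hat{p})\, R(\vec{k}')} \tag{A.20} \]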
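
For completeness, the evaluation of $\Psi$ at the saddle point can be spelled out term by term. The lines below are our own intermediate steps, obtained by inserting the saddle-point equations for $\hat{R}$ and $\hat{S}$ into the expression for $\Psi$ found after the $\omega$ and $\psi$ integrations; $\hat{p}(\vec{k})$ denotes the degree distribution of the sequence $(\vec{k}_1,\ldots,\vec{k}_N)$.

```latex
\begin{align*}
\mathrm{i}\sum_{\vec{k}} \hat{R}(\vec{k}) R(\vec{k})
  &= -\sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm in} = -\bar{k},
  \qquad
  \mathrm{i}\sum_{\vec{k}} \hat{S}(\vec{k}) S(\vec{k})
  = -\sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm out} = -\bar{k}, \\
\bar{k}\sum_{\vec{k},\vec{k}'} R(\vec{k})\, Q(\vec{k},\vec{k}'|\hat{p})\, S(\vec{k}')
  &= \sum_{\vec{k}} R(\vec{k})\, \frac{\hat{p}(\vec{k})\, k^{\rm in}}{R(\vec{k})} = \bar{k}, \\
\Psi\big|_{\rm saddle}
  &= (-\bar{k} - \bar{k}) + \bar{k} - \bar{k}
   + \sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm in}
       \log\Big[\frac{\hat{p}(\vec{k})\, k^{\rm in}}{R(\vec{k})}\Big]
   + \sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm out}
       \log\Big[\frac{\hat{p}(\vec{k})\, k^{\rm out}}{S(\vec{k})}\Big]
   - \sum_{\vec{k}} \hat{p}(\vec{k}) \log(k^{\rm in}!\, k^{\rm out}!) .
\end{align*}
```

The constant terms combine to $-2\bar{k}$, and the second line is the statement that $\sum_{\vec{k},\vec{k}'} R\,Q\,S = 1$ at the saddle point.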

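Finally, the quantity (A.3) we aim to calculate is defined as the value of $\phi_2(\ldots)$ upon 
substituting $(\vec{k}_1,\ldots,\vec{k}_N) \to (\vec{k}_1(\mathbf{c}),\ldots,\vec{k}_N(\mathbf{c}))$. The only occurrences of the sequence 
$(\vec{k}_1,\ldots,\vec{k}_N)$ in the formula (A.19) are in the values of $\hat{p}(\vec{k})$ and $\bar{k}$, so we obtain $\phi_2(\mathbf{c}|Q)$ 
by making in (A.19) the substitutions $\hat{p}(\vec{k}) \to p(\vec{k}|\mathbf{c})$ and $\bar{k} \to \bar{k}(\mathbf{c})$. We conclude that 

\[ \phi_2(\mathbf{c}|Q) = -2\bar{k} - \sum_{\vec{k}} \hat{p}(\vec{k}) \log(k^{\rm in}!\, k^{\rm out}!) + \sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm in} \log\Big[ \frac{\hat{p}(\vec{k})\, k^{\rm in}}{R(\vec{k}|\hat{p},Q)} \Big] + \sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm out} \log\Big[ \frac{\hat{p}(\vec{k})\, k^{\rm out}}{S(\vec{k}|\hat{p},Q)} \Big] \tag{A.21} \]

in which $\hat{p}(\vec{k}) = p(\vec{k}|\mathbf{c})$ and $\bar{k} = \bar{k}(\mathbf{c})$, and in which $R(\vec{k}|\hat{p},Q)$ and $S(\vec{k}|\hat{p},Q)$ are the 
solutions of 

\[ R(\vec{k}) = \frac{\hat{p}(\vec{k})\, k^{\rm in}}{\bar{k} \sum_{\vec{k}'} Q(\vec{k},\vec{k}'|\hat{p})\, S(\vec{k}')}, \qquad S(\vec{k}) = \frac{\hat{p}(\vec{k})\, k^{\rm out}}{\bar{k} \sum_{\vec{k}'} Q(\vec{k}',\vec{k}|\hat{p})\, R(\vec{k}')} \tag{A.22} \]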



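Appendix A.3. Final analytical expression for $\Omega$ 

The intermediate results (A.6, A.21) can now be substituted back into expression (A.1), 
which gives a formula that is seen to depend on $\mathbf{c}$ only via $W(\vec{k},\vec{k}'|\mathbf{c})$ and $p(\vec{k}|\mathbf{c})$: 

\[ \Omega(\mathbf{c}|p,Q) = \Big[ \sum_{\vec{k}} \hat{p}(\vec{k}) \log p(\vec{k}) + \bar{k}\big[ 1 + \log[\bar{k}/N] \big] + \sum_{\vec{k}} \hat{p}(\vec{k}) \log(k^{\rm in}!\, k^{\rm out}!) - \sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm in} \log\Big[ \frac{\hat{p}(\vec{k})\, k^{\rm in}}{R(\vec{k}|\hat{p},Q)} \Big] - \sum_{\vec{k}} \hat{p}(\vec{k})\, k^{\rm out} \log\Big[ \frac{\hat{p}(\vec{k})\, k^{\rm out}}{S(\vec{k}|\hat{p},Q)} \Big] + \bar{k} \sum_{\vec{k},\vec{k}'} W(\vec{k},\vec{k}') \log Q(\vec{k},\vec{k}'|\hat{p}) \Big]_{W(\vec{k},\vec{k}') = W(\vec{k},\vec{k}'|\mathbf{c}),\ \hat{p}(\vec{k}) = p(\vec{k}|\mathbf{c})} + \epsilon_N \tag{A.23} \]

with $\lim_{N\to\infty} \epsilon_N = 0$, $\bar{k} = \sum_{\vec{k}} k^{\rm in}\hat{p}(\vec{k}) = \sum_{\vec{k}} k^{\rm out}\hat{p}(\vec{k})$, and with the two functions 
$S(\vec{k}|\hat{p},Q)$ and $R(\vec{k}|\hat{p},Q)$ to be extracted from (A.22). 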

Appendix B. Calculation of the kernel W 

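For large $N$ the kernel $W(\vec{k},\vec{k}') = (N\bar{k})^{-1} \sum_{ij} c_{ij}\, \delta_{\vec{k},\vec{k}_i}\, \delta_{\vec{k}',\vec{k}_j}$ is self-averaging in 
the ensemble (3.1), i.e. with probability one any graph generated randomly according 
to (3.1) will exhibit the same kernel, modulo finite size effects. Thus we may for 
$N \to \infty$ calculate $W(\vec{k},\vec{k}')$ as an average over the ensemble (3.1): 

\begin{align*}
W(\vec{k},\vec{k}') &= \frac{1}{N\bar{k}} \sum_{r \neq s} \sum_{\vec{k}_1 \ldots \vec{k}_N} \Big[ \prod_i p(\vec{k}_i) \Big] \frac{\delta_{\vec{k},\vec{k}_r}\, \delta_{\vec{k}',\vec{k}_s}}{Z(\vec{k}_1 \ldots \vec{k}_N,Q)} \sum_{\mathbf{c}} c_{rs} \Big[ \prod_i \delta_{\vec{k}_i,\vec{k}_i(\mathbf{c})} \Big] \prod_{i \neq j} \Big[ \frac{\bar{k}}{N} Q(\vec{k}_i,\vec{k}_j|\hat{p})\, \delta_{c_{ij},1} + \Big( 1 - \frac{\bar{k}}{N} Q(\vec{k}_i,\vec{k}_j|\hat{p}) \Big) \delta_{c_{ij},0} \Big] \\
&= \frac{1}{N^2} \sum_{r \neq s} \sum_{\vec{k}_1 \ldots \vec{k}_N} \Big[ \prod_i p(\vec{k}_i) \Big] \frac{\delta_{\vec{k},\vec{k}_r}\, \delta_{\vec{k}',\vec{k}_s}}{Z(\vec{k}_1 \ldots \vec{k}_N,Q)} \int \prod_i \Big[ \frac{d\omega_i\, d\psi_i}{4\pi^2}\, e^{\mathrm{i}[\omega_i k_i^{\rm in} + \psi_i k_i^{\rm out}]} \Big] L(\omega,\psi|\hat{p},Q) \\
&\qquad \times Q(\vec{k}_r,\vec{k}_s|\hat{p})\, e^{-\mathrm{i}(\omega_r+\psi_s)} \big[ 1 + \mathcal{O}(N^{-1}) \big] \\
&= \big[ 1 + \mathcal{O}(N^{-1}) \big] \sum_{\vec{k}_1 \ldots \vec{k}_N} \Big[ \prod_i p(\vec{k}_i) \Big] \frac{Q(\vec{k},\vec{k}'|\hat{p})}{Z(\vec{k}_1 \ldots \vec{k}_N,Q)} \int \prod_{\vec{q}} \Big[ \frac{dR(\vec{q})\, d\hat{R}(\vec{q})\, dS(\vec{q})\, d\hat{S}(\vec{q})}{4\pi^2/N^2} \Big] e^{N \big( \mathrm{i} \sum_{\vec{q}} [\hat{R}(\vec{q})R(\vec{q}) + \hat{S}(\vec{q})S(\vec{q})] + \bar{k} \sum_{\vec{q},\vec{q}'} R(\vec{q}) Q(\vec{q},\vec{q}'|\hat{p}) S(\vec{q}') - \bar{k} \big) + \mathcal{O}(N^0)} \\
&\qquad \times R(\vec{k})\, S(\vec{k}') \prod_i \int_{-\pi}^{\pi} \Big[ \frac{d\omega\, d\psi}{4\pi^2}\, e^{\mathrm{i}[\omega k_i^{\rm in} + \psi k_i^{\rm out}] - \mathrm{i}\hat{R}(\vec{k}_i) e^{-\mathrm{i}\omega} - \mathrm{i}\hat{S}(\vec{k}_i) e^{-\mathrm{i}\psi}} \Big] \tag{B.1}
\end{align*}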

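We now write $Z(\vec{k}_1 \ldots \vec{k}_N, Q)$ also as an integral over order parameters, as in our 
earlier derivation of (A.19), but noting that now the relevant degree distribution is 
that of our ensemble (3.1), i.e. $p(\vec{k})$ instead of $\hat{p}(\vec{k})$. This gives 

\[ W(\vec{k},\vec{k}') = \big[ 1 + \mathcal{O}(N^{-1}) \big]\, Q(\vec{k},\vec{k}'|p) \sum_{\vec{k}_1 \ldots \vec{k}_N} \Big[ \prod_i p(\vec{k}_i) \Big] \frac{\int \prod_{\vec{q}} \big[ dR(\vec{q})\, d\hat{R}(\vec{q})\, dS(\vec{q})\, d\hat{S}(\vec{q}) \big]\, R(\vec{k})\, S(\vec{k}')\, e^{N\Psi[R,\hat{R},S,\hat{S}|p,Q] + \mathcal{O}(\log N)}}{\int \prod_{\vec{q}} \big[ dR(\vec{q})\, d\hat{R}(\vec{q})\, dS(\vec{q})\, d\hat{S}(\vec{q}) \big]\, e^{N\Psi[R,\hat{R},S,\hat{S}|p,Q] + \mathcal{O}(\log N)}} \tag{B.2} \]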

where the non-extensive terms in the exponentials of numerator and denominator are 
fully identical, and with $\Psi$ as defined in (A.15), modulo the replacement $\hat{p} \to p$. The 
summation over degree sequences has now become obsolete, and for $N \to \infty$ we obtain 

\[ \lim_{N\to\infty} W(\vec{k},\vec{k}') = R(\vec{k}|p,Q)\, Q(\vec{k},\vec{k}'|p)\, S(\vec{k}'|p,Q) \tag{B.3} \]

in which $R(\vec{k}|p,Q)$ and $S(\vec{k}|p,Q)$ are to be solved from 

\[ R(\vec{k}) = \frac{p(\vec{k})\, k^{\rm in}}{\bar{k} \sum_{\vec{k}'} Q(\vec{k},\vec{k}'|p)\, S(\vec{k}')}, \qquad S(\vec{k}) = \frac{p(\vec{k})\, k^{\rm out}}{\bar{k} \sum_{\vec{k}'} Q(\vec{k}',\vec{k}|p)\, R(\vec{k}')} \tag{B.4} \]

with the average degree of our ensemble, $\bar{k} = \sum_{\vec{k}} k^{\rm in} p(\vec{k}) = \sum_{\vec{k}} k^{\rm out} p(\vec{k})$. 
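
The coupled equations for $R$ and $S$ generally have to be solved numerically. One simple possibility, sketched below in Python, is to iterate the two equations alternately until the resulting kernel $W = R\,Q\,S$ has the required marginals $k^{\rm in}p(\vec{k})/\bar{k}$ and $k^{\rm out}p(\vec{k})/\bar{k}$. This alternating scheme is our own choice and is not claimed to be the method used for the results in this paper; the function name, the small set of degree pairs and the random positive kernel are hypothetical, and convergence is only assumed for strictly positive $Q$.

```python
import numpy as np

def solve_kernel(p, k_in, k_out, Q, max_iter=10_000, tol=1e-12):
    """Alternately iterate the equations for R and S; return R, S and W = R Q S.
    Arrays are indexed by an enumeration of the degree pairs (k_in, k_out)."""
    p, k_in, k_out, Q = (np.asarray(x, dtype=float) for x in (p, k_in, k_out, Q))
    kbar = np.sum(p * k_in)
    a = p * k_in / kbar       # required marginal over the second argument of W
    b = p * k_out / kbar      # required marginal over the first argument of W
    S = b.copy()              # arbitrary starting point
    for _ in range(max_iter):
        R = a / (Q @ S)       # R(k) = p(k) k_in  / (kbar * sum_k' Q(k,k') S(k'))
        S = b / (Q.T @ R)     # S(k) = p(k) k_out / (kbar * sum_k' Q(k',k) R(k'))
        W = R[:, None] * Q * S[None, :]
        if np.abs(W.sum(axis=1) - a).max() < tol:
            break
    return R, S, W

# Hypothetical example with four degree pairs (k_in, k_out)
k_in = np.array([0, 1, 1, 2])
k_out = np.array([1, 0, 1, 2])
p = np.array([0.25, 0.25, 0.3, 0.2])      # sum_k p k_in = sum_k p k_out = 0.95
rng = np.random.default_rng(1)
Q = 0.5 + rng.random((4, 4))              # arbitrary positive kernel
Q /= p @ Q @ p                            # normalise: sum p(k) p(k') Q(k,k') = 1
R, S, W = solve_kernel(p, k_in, k_out, Q)
```

Note that $R$ and $S$ are only determined up to the rescaling $R \to \lambda R$, $S \to S/\lambda$, which leaves $W$ unchanged; the sketch simply returns whichever representative the iteration reaches. For the uncorrelated choice $Q = 1$ the iteration stops immediately with $W(\vec{k},\vec{k}') = p(\vec{k})k^{\rm in}\,p(\vec{k}')k'^{\rm out}/\bar{k}^2$.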



